AITopics

Country: Asia > China > Guangdong Province > Shenzhen (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Neural Information Processing Systems Dec-24-2025, 04:02:37 GMT

Weighted Mutual Learning with Diversity-Driven Model Compression

Online distillation attracts attention from the community as it simplifies the traditional two-stage knowledge distillation process into a single stage. Online distillation collaboratively trains a group of peer models, which are treated as students, and all students gain extra knowledge from each other. However, memory consumption and diversity among peers are two key challenges to the scalability and quality of online distillation. To address these two challenges, this paper presents a framework called Weighted Mutual Learning with Diversity-Driven Model Compression (WML) for online distillation. First, based on a hierarchical structure where peers share different parts, we leverage structured network pruning to generate diversified peer models and reduce the memory requirements. Second, rather than taking the average of peers, this paper, for the first time, leverages a bi-level formulation to estimate the relative importance of peers in closed form, to further boost the effectiveness of the distillation from each other. Extensive experiments show the generalization of the proposed framework, which outperforms existing online distillation methods on a variety of deep neural networks. More interestingly, as a byproduct, WML produces a series of pruned models of different sizes in a single run, which also achieve competitive results compared with existing channel pruning methods.
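As a rough illustration of weighting peers in mutual learning, the sketch below combines each peer's cross-entropy with a weighted KL term toward the other peers. It is a minimal sketch and not the paper's formulation: the function names, the `alpha` mixing factor, and the hand-set `weights` are assumptions, whereas the abstract describes estimating peer importance through a bi-level formulation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def cross_entropy(probs, label):
    """Negative log-likelihood of the true label."""
    return -math.log(probs[label])

def weighted_mutual_loss(peer_logits, label, weights, alpha=0.5):
    """Per-peer loss: (1 - alpha) * CE + alpha * weighted KL from the other peers.

    `weights` are hypothetical, hand-chosen peer importances (assumption).
    """
    probs = [softmax(z) for z in peer_logits]
    losses = []
    for i, p_i in enumerate(probs):
        others = [(weights[j], probs[j]) for j in range(len(probs)) if j != i]
        norm = sum(w for w, _ in others)
        # Peer i is pulled toward each other peer j in proportion to w_j.
        distill = sum(w * kl_divergence(p_j, p_i) for w, p_j in others) / norm
        losses.append((1 - alpha) * cross_entropy(p_i, label) + alpha * distill)
    return losses

# Three toy peers predicting over three classes; the true class is 0.
peers = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [1.0, 1.0, 0.0]]
print(weighted_mutual_loss(peers, label=0, weights=[0.5, 0.3, 0.2]))
```

With uniform weights this reduces to averaging over peers, which is the baseline the abstract contrasts against.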

diversity-driven model compression, name change, weighted mutual learning, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.60)

Neural Information Processing Systems Nov-20-2025, 18:27:03 GMT

Knowledge Distillation by On-the-Fly Native Ensemble

Xu Lan, Xiatian Zhu, Shaogang Gong

Neural Information Processing Systems http://nips.cc/

artificial intelligence, distillation, machine learning, (15 more...)

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > China (0.04)

Industry: Education (0.69)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Neural Information Processing Systems Aug-14-2025, 16:27:31 GMT

4b25c000967af9036fb9b207b198a626-Paper-Conference.pdf

distillation, neural network, online distillation, (13 more...)

Country:

Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
Europe > Denmark > North Jutland > Aalborg (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.73)

arXiv.org Artificial Intelligence Oct-14-2024

Temperature-Centric Investigation of Speculative Decoding with Knowledge Distillation

Ouyang, Siru, Wang, Shuohang, Jiang, Minhao, Zhong, Ming, Yu, Donghan, Han, Jiawei, Shen, Yelong

Speculative decoding stands as a pivotal technique to expedite inference in autoregressive (large) language models. This method employs a smaller draft model to speculate a block of tokens, which the target model then evaluates for acceptance. Despite a wealth of studies aimed at increasing the efficiency of speculative decoding, the influence of generation configurations on the decoding process remains poorly understood, especially concerning decoding temperatures. This paper delves into the effects of decoding temperatures on speculative decoding's efficacy. Beginning with knowledge distillation (KD), we first highlight the challenge of decoding at higher temperatures, and demonstrate that KD in a consistent temperature setting could be a remedy. We also investigate the effects of out-of-domain testing sets with out-of-range temperatures. Building upon these findings, we take an initial step to further the speedup for speculative decoding, particularly in a high-temperature generation setting. Our work offers new insights into how generation configurations drastically affect the performance of speculative decoding, and underscores the need for developing methods that focus on diverse decoding configurations. Code is publicly available at https://github.com/ozyyshr/TempSpec.
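To make the draft-then-verify step concrete, here is a minimal sketch of the standard speculative sampling acceptance rule for a single token, with temperature applied to both models' logits. It is not taken from the TempSpec code: the function names and toy logits are assumptions, and real systems verify a whole block of draft tokens rather than one.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature flattens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def acceptance_rate(p, q):
    """Expected probability that a draft token sampled from q is accepted under p."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))

def speculative_step(p, q, rng):
    """Sample from draft q, accept with prob min(1, p/q), else resample from the residual.

    Returns (token, accepted). The output token is distributed according to p.
    """
    x = rng.choices(range(len(q)), weights=q)[0]
    if rng.random() < min(1.0, p[x] / q[x]):
        return x, True
    # On rejection, sample from the normalized positive part of (p - q).
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return rng.choices(range(len(p)), weights=residual)[0], False

# Hypothetical logits for a target model and a smaller draft model.
target_logits = [2.0, 1.0, 0.0]
draft_logits = [1.5, 1.2, 0.3]
rng = random.Random(0)
for temperature in (0.5, 1.0, 2.0):
    p = softmax(target_logits, temperature)
    q = softmax(draft_logits, temperature)
    print(temperature, acceptance_rate(p, q), speculative_step(p, q, rng))
```

The acceptance rate depends only on how closely the draft distribution matches the target at the chosen temperature, which is why the temperature used when distilling the draft model can matter.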

distillation, large language model, machine learning, (18 more...)

2410.10141

Country:

Europe > Austria > Vienna (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Texas (0.04)
(5 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.68)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing Systems Oct-10-2024, 23:13:02 GMT

Weighted Mutual Learning with Diversity-Driven Model Compression

distillation, diversity-driven model compression, weighted mutual learning, (1 more...)

Genre: Play > Prospect > Container > Trap (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

arXiv.org Artificial Intelligence Oct-12-2023

Guided Online Distillation: Promoting Safe Reinforcement Learning by Offline Demonstration

Li, Jinning, Liu, Xinyi, Zhu, Banghua, Jiao, Jiantao, Tomizuka, Masayoshi, Tang, Chen, Zhan, Wei

Safe Reinforcement Learning (RL) aims to find a policy that achieves high rewards while satisfying cost constraints. When learning from scratch, safe RL agents tend to be overly conservative, which impedes exploration and restrains the overall performance. In many realistic tasks, e.g., autonomous driving, large-scale expert demonstration data are available. We argue that extracting an expert policy from offline data to guide online exploration is a promising solution to mitigate the conservativeness issue. Large-capacity models, e.g., decision transformers (DT), have been proven to be competent in offline policy learning. However, data collected in real-world scenarios rarely contain dangerous cases (e.g., collisions), which makes it prohibitive for the policies to learn safety concepts. Besides, these bulky policy networks cannot meet the computation speed requirements at inference time on real-world tasks such as autonomous driving. To this end, we propose Guided Online Distillation (GOLD), an offline-to-online safe RL framework. GOLD distills an offline DT policy into a lightweight policy network through guided online safe RL training, which outperforms both the offline DT policy and online safe RL algorithms. Experiments in both benchmark safe RL tasks and real-world driving tasks based on the Waymo Open Motion Dataset (WOMD) demonstrate that GOLD can successfully distill lightweight policies and solve decision-making problems in challenging safety-critical scenarios.

demonstration, gold, guide policy, (13 more...)

2309.09408

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > California > Alameda County > Berkeley (0.14)

Genre:

Research Report (0.70)
Instructional Material (0.46)

Industry:

Transportation (0.54)
Information Technology (0.54)
Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

Boldo, Michele, Martini, Enrico, De Marchi, Mirco, Aldegheri, Stefano, Bombieri, Nicola

On the Query Strategies for Efficient Online Active Distillation

arXiv.org Artificial Intelligence Sep-4-2023

Deep Learning (DL) requires lots of time and data, resulting in high computational demands. Recently, researchers have employed Active Learning (AL) and online distillation to enhance training efficiency and real-time model adaptation. This paper evaluates a set of query strategies to achieve the best training results. It focuses on Human Pose Estimation (HPE) applications, assessing the impact of selected frames during training using two approaches: a classical offline method and an online evaluation through a continual learning approach employing knowledge distillation, on a popular state-of-the-art HPE dataset. The paper demonstrates the possibility of training lightweight models at the edge, adapting them effectively to new contexts in real time.

dataset, distillation, query strategy, (15 more...)

2309.01612

Country: Europe > Italy (0.14)

Genre: Research Report (0.84)

Industry: Education (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

MacAvaney, Sean, Wang, Xi

Online Distillation for Pseudo-Relevance Feedback

arXiv.org Artificial Intelligence Jun-16-2023

Model distillation has emerged as a prominent technique to improve neural search models. To date, distillation has taken an offline approach, wherein a new neural model is trained to predict relevance scores between arbitrary queries and documents. In this paper, we explore a departure from this offline distillation strategy by investigating whether a model for a specific query can be effectively distilled from neural re-ranking results (i.e., distilling in an online setting). Indeed, we find that a lexical model distilled online can reasonably replicate the re-ranking of a neural model. More importantly, these models can be used as queries that execute efficiently on indexes. This second retrieval stage can enrich the pool of documents for re-ranking by identifying documents that were missed in the first retrieval stage. Empirically, we show that this approach performs favourably when compared with established pseudo-relevance feedback techniques, dense retrieval methods, and sparse-dense ensemble "hybrid" approaches.

artificial intelligence, machine learning, natural language, (18 more...)

2306.09657

Country:

North America > Canada (0.04)
Oceania > Australia > Queensland (0.04)
North America > United States > Maryland > Montgomery County > Gaithersburg (0.04)
(12 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

arXiv.org Artificial Intelligence May-16-2023

Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation

Ren, Yuxin, Zhong, Zihan, Shi, Xingjian, Zhu, Yi, Yuan, Chun, Li, Mu

It has been commonly observed that a teacher model with superior performance does not necessarily result in a stronger student, highlighting a discrepancy between current teacher training practices and effective knowledge transfer. In order to enhance the guidance of the teacher training process, we introduce the concept of distillation influence to determine the impact of distillation from each training sample on the student's generalization ability. In this paper, we propose Learning Good Teacher Matters (LGTM), an efficient training technique for incorporating distillation influence into the teacher's learning process. By prioritizing samples that are likely to enhance the student's generalization ability, our LGTM outperforms 10 common knowledge distillation baselines on 6 text classification tasks in the GLUE benchmark.

distillation, machine learning, natural language, (20 more...)

2305.09651

Country:

North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
North America > United States > New York (0.04)
Europe > France (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Education > Teacher Education (0.44)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)